Disfluency in Swedish human–human and human–machine travel booking dialogues
Author: Robert Eklund
Abstract
This thesis studies disfluency in spontaneous Swedish speech, i.e., the occurrence of hesitation phenomena like eh, öh, truncated words, repetitions and repairs, mispronunciations and so on. The thesis is divided into three parts:

PART I provides the background, covering scientific, personal and industrial–academic aspects, in the Tuning in quotes, the Preamble and the Introduction (chapter 1).

PART II consists of one chapter only, chapter 2, which dives into the etiology of disfluency. It describes previous research on disfluencies, including areas that are not the main focus of the present tome, like stuttering, psychotherapy, philosophy, neurology, discourse perspectives, speech production, application-driven perspectives, cognitive aspects, and so on. A discussion of terminology and definitions is also provided. The goal of this chapter is to provide as broad a picture as possible of the phenomenon of disfluency, and of how all those different perspectives are related to each other.

PART III describes the linguistic data studied and analyzed in this thesis, with the following structure:
Chapter 3 describes how the speech data were collected, and for what reason. Sum totals of the data and the post-processing method are also described.
Chapter 4 describes how the data were transcribed, annotated and analyzed. The labeling method is described in detail, as is the method employed to do frequency counts.
Chapter 5 presents the analysis and results for all different categories of disfluencies. Besides general frequency and distribution of the different types of disfluencies, both inter- and intra-corpus results are presented, as are co-occurrences of different types of disfluencies. Inter- and intra-speaker differences are also discussed.
Chapter 6 discusses the results, mainly in light of previous research. Reasons for the observed frequencies and distribution are proposed, as is their relation to language typology, along with syntactic, morphological and phonetic reasons for the observed phenomena. Future work is also envisaged: work that is possible on the present data set, work that is possible on the present data set given extended labeling, and work that I think should be carried out but where the present data set fails, in one way or another, to meet the requirements of such studies.

Appendices 1–4 list the sum total of all data analyzed in this thesis (apart from Tok Pisin data). Appendix 5 provides an example of a full human–computer dialogue.

Robert Eklund, Linköping 2004
Similar resources
Crosslinguistic disfluency modelling: a comparative analysis of Swedish and American English human-human and human-machine dialogues
A Comparative Study of Disfluencies in Four Swedish Travel Dialogue Corpora
This paper reports on ongoing work on disfluencies carried out at Telia Research AB. Four travel dialogue corpora are described: human–"machine"–human (Wizard-of-Oz); human–"machine" (Wizard-of-Oz); human–human; and human–machine. The data collection methods are outlined and their possible influence on the collected material is discussed. An annotation scheme for disfluency labelling is...
Crosslinguistic disfluency modeling: a comparative analysis of Swedish and Tok Pisin human-human ATIS dialogues
This paper studies disfluencies in authentic human–human dialogues in Swedish and Tok Pisin. It is found that while there are no major differences as to types or frequencies on a macro level, there are dissimilarities on a micro level, notably in the characteristics of how prolonged segments are realized. The paper also discusses the results in the light of reported disfluencies in English, Ger...
A comparison of disfluency distribution in a unimodal and a multimodal speech interface
In this paper, we compare the distribution of disfluencies in two human–computer dialogue corpora. One corpus consists of unimodal travel booking dialogues, which were recorded over the telephone. In this unimodal system, all components except the speech recognition were authentic. The other corpus was collected using a semi-simulated multi-modal dialogue system with an animated talking agent a...
A User Simulator for Task-Completion Dialogues
Despite widespread interests in reinforcement-learning for task-oriented dialogue systems, several obstacles can frustrate research and development progress. First, reinforcement learners typically require interaction with the environment, so conventional dialogue corpora cannot be used directly. Second, each task presents specific challenges, requiring separate corpus of task-specific annotate...
Publication date: 2004